60 research outputs found

    Using collocation segmentation to augment the phrase table

    This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores from them. Performing a collocation segmentation over the source and target corpora before alignment causes different and larger phrases to be extracted from the same original documents. We performed this segmentation and used the union of this phrase set with the phrase set extracted from the non-segmented corpus to compute the phrase table. We present the configurations considered and also report results obtained with internal and official test sets. Postprint (published version)
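The abstract does not say how the two phrase sets are pooled or scored. As a minimal sketch, assume that phrase-pair occurrences from the plain and the segmented extraction are pooled (counts summed) and scored with the usual relative-frequency estimate phi(t|s) = count(s, t) / count(s). The function name and toy phrases are hypothetical, and segmented phrases are assumed to have already been mapped back to plain words.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def build_phrase_table(
    plain_pairs: Iterable[PhrasePair],
    segmented_pairs: Iterable[PhrasePair],
) -> Dict[PhrasePair, float]:
    """Pool phrase pairs from both extractions and score them by relative frequency.

    Assumption (not from the paper): the union is taken over phrase-pair
    occurrences, so counts from the two extractions are summed before scoring.
    """
    pair_counts = Counter(plain_pairs)
    pair_counts.update(segmented_pairs)

    # Marginal count of each source phrase over the pooled set.
    source_counts: Counter = Counter()
    for (src, _tgt), n in pair_counts.items():
        source_counts[src] += n

    # phi(t | s) = count(s, t) / count(s)
    return {
        (src, tgt): n / source_counts[src]
        for (src, tgt), n in pair_counts.items()
    }
```

With toy input where "la maison" is paired three times with "the house" and once with "home" across both extractions, the table assigns 0.75 and 0.25 to those two translations.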

    UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system

    This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These novel segmentations are computed using statistical measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information, or Gravity-Counts. The analysis of translation results allows the measures to be divided into three groups. First, Log-likelihood, Chi-squared, and T-score tend to combine high-frequency words, and the resulting collocation segments are very short. They improve the SMT system by adding new translation units. Second, Mutual Information and Dice tend to combine low-frequency words, and the resulting collocation segments are short. They improve the SMT system by smoothing the translation units. Third, Gravity-Counts tends to combine both high- and low-frequency words, and the resulting collocation segments are long; in this case, however, the SMT system is not improved. Thus, the road map for improving the translation system is to introduce new phrases made of either low-frequency or high-frequency words; it is hard to improve translation quality by introducing new phrases that combine low- and high-frequency words. Experimental results are reported on the French-to-English IWSLT 2010 evaluation, where our system was ranked 3rd out of nine systems. Postprint (published version)
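To make the measures concrete, the sketch below computes the textbook versions of five of them for a candidate bigram (w1, w2) from a 2x2 contingency table of corpus counts. These are standard definitions; the exact variants and thresholds used in the paper are not given in the abstract and may differ, and Gravity-Counts is not shown. The function name and counts are hypothetical.

```python
import math
from typing import Dict


def association_scores(c12: int, c1: int, c2: int, n: int) -> Dict[str, float]:
    """Standard bigram association measures from a 2x2 contingency table.

    c12: count of the bigram (w1, w2)
    c1:  count of w1 in first position; c2: count of w2 in second position
    n:   total number of bigrams in the corpus
    Requires 0 < c12, 0 < c1 < n and 0 < c2 < n.
    """
    # Observed cells: rows = (w1, not w1), columns = (w2, not w2).
    observed = [[c12, c1 - c12], [c2 - c12, n - c1 - c2 + c12]]
    rows = [c1, n - c1]
    cols = [c2, n - c2]
    # Expected counts under independence of w1 and w2.
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2) for j in range(2)
    )
    # Log-likelihood ratio G^2; cells with zero observations contribute 0.
    llr = 2 * sum(
        observed[i][j] * math.log(observed[i][j] / expected[i][j])
        for i in range(2) for j in range(2) if observed[i][j] > 0
    )
    return {
        "dice": 2 * c12 / (c1 + c2),
        "pmi": math.log2(c12 * n / (c1 * c2)),  # pointwise mutual information
        "t_score": (c12 - expected[0][0]) / math.sqrt(c12),
        "chi_squared": chi2,
        "log_likelihood": llr,
    }
```

When the bigram count equals its expected value under independence (for example c12=4, c1=20, c2=200, n=1000), PMI, chi-squared and log-likelihood are all zero; a segmenter would join words whose score exceeds some chosen threshold.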

    Improving statistical machine translation through adaptation and learning

    With the arrival of free online machine translation (MT) systems came the possibility of improving automatic translations with the help of everyday users. One way to achieve such improvements is to ask users themselves for a better translation. If the system makes a mistake and the user is able to detect it, it would be valuable to let the user teach the system where the mistake was, so that the system does not repeat it when it encounters a similar situation. Most online translation systems provide either a text area where users can suggest a better translation (like Google Translate) or a ranking system (like Microsoft's). In 2009, as part of the Seventh Framework Programme of the European Commission, the FAUST project started with the goal of developing "machine translation (MT) systems which respond rapidly and intelligently to user feedback". Specifically, one of the project's objectives was to "develop mechanisms for instantaneously incorporating user feedback into the MT engines that are used in production environments, ...". As a member of the FAUST project, this thesis focused on developing one such mechanism. Formally, the general objective of this work was to design and implement a strategy to improve the translation quality of an already trained Statistical Machine Translation (SMT) system, using translations of input sentences that are corrections of the system's attempts to translate them. To address this problem, we divided it into three specific objectives:
1. Define a relation between the words of a correction sentence and the words in the system's translation, in order to detect the errors that the former aims to correct.
2. Include the error corrections in the original system, so that it learns how to solve them if a similar situation occurs.
3. Test the strategy in different scenarios and with different data, in order to validate the applications of the proposed methodology.
The main contributions to the SMT field made in this Ph.D. thesis are:
- We defined a similarity function that compares an MT system output with a reference translation for that output and aligns the errors made by the system with the correct translations found in the reference. This information is then used to compute an alignment between the original input sentence and the reference.
- We defined a method to perform domain adaptation based on this alignment. Using the alignment with an in-domain parallel corpus, we extract both translation units that already exist in the system and were correctly chosen during translation, and new units that include the correct translations found in the reference. These new units are then scored and combined with the units in the original system in order to improve its quality in terms of both human and automatic metrics.
- We successfully applied the method to a new task: improving the translation quality of an SMT system using post-edits provided by real users of the system. In this case, the alignment was computed over a parallel corpus built from post-edits, extracting both translation units that already exist in the system and were correctly chosen during translation, and new units that include the corrections found in the feedback provided.
- The proposed method achieves significant improvements in translation quality with a small amount of learning material, corresponding to 0.5% of the training material used to build the original system. Results from our evaluations also indicate that the improvement achieved with the domain adaptation strategy is measurable by both automatic and human-based evaluation metrics.
This thesis proposes a new method for improving a Statistical Machine Translation (SMT) system using post-edits of its automatic translations. The strategy can be associated with domain adaptation, treating the post-edits obtained from real users of the translation system as the in-domain material to adapt to. The method compares the post-edits with the automatic translations in order to automatically detect the places where the translator made an error, so that it can learn from them. Once the errors have been detected, a word-level alignment is computed between the original sentences and the post-edits to extract translation units, which are then incorporated into the baseline system so that the errors are corrected in future translations. Our results show statistically significant improvements from a dataset whose size is 0.5% of the material used during training. Along with the automatic quality measures, we also present a qualitative analysis of the system to validate the results. The translation improvements are observed mostly in the lexicon and in word reordering, followed by morphological corrections. The strategy, which introduces the concepts of augmented corpus, similarity function, and derived translation units, is tested with two SMT paradigms (N-gram-based and phrase-based translation), with two language pairs (Catalan-Spanish and English-Spanish), and in different domain adaptation scenarios, including an open domain in which the system was adapted using requests collected from real users over the Internet, obtaining similar results in all tests. The results of this research are part of the FAUST project (Feedback Analysis for User adaptive Statistical Translation), a project of the Seventh Framework Programme of the European Commission.
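The thesis defines its own similarity function, whose details are not in the abstract. As a stand-in only, the sketch below uses Python's standard `difflib.SequenceMatcher` at word level to locate the spans where a user's post-edit differs from the system output, which is the kind of information the first step needs before aligning corrections back to the source sentence. The function name and example sentences are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def error_spans(mt_output: str, post_edit: str) -> List[Tuple[str, str, str]]:
    """Return (operation, MT span, post-edit span) for each differing region.

    Stand-in for the thesis's similarity function: difflib's word-level
    longest-matching-block comparison, not the original method.
    """
    mt_words = mt_output.split()
    pe_words = post_edit.split()
    matcher = SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)
    spans = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # one of 'replace', 'delete' or 'insert'
            spans.append((op, " ".join(mt_words[i1:i2]), " ".join(pe_words[j1:j2])))
    return spans
```

For the system output "he go to school" and the post-edit "he goes to school", the function reports a single replacement of "go" by "goes"; such pairs are candidates for new translation units.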

    CO2 Transoral Laser Microsurgery in Benign, Premalignant and Malignant (Tis, T1, T2) Lesion of the Glottis. A Literature Review

    Carbon dioxide transoral laser microsurgery is a reliable option for the treatment of early glottic carcinoma (Tis-T2), with good functional and oncological outcomes, and is nowadays one of the main options in larynx preservation protocols. The development and improvement of laser devices means that surgeons can use more precise instruments compared with classic cold dissection in laser-assisted phonosurgery. Secondary effects on voice, swallowing, and quality of life, as well as complications, have been well documented. The introduction of a newly proposed staging system based on the principle of a three-dimensional map of isoprognostic zones, the use of narrow-band imaging in clinical and intraoperative evaluation, the implementation of diffusion-weighted magnetic resonance imaging in preoperative evaluation, the development of new tools to improve surgical quality, and preliminary reports on the use of the carbon dioxide laser in transoral robotic surgery together suggest an exciting future for this technique.

    Leveraging online user feedback to improve statistical machine translation

    In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general-purpose SMT system. Interestingly, the quality improvement is due not only to increased lexical coverage but also to better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically web-crawled parallel corpus. Using exactly the same architecture and models again provides a significant improvement in the translation quality of a general-purpose baseline SMT system. Peer Reviewed. Postprint (author's final draft)
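The abstract says only that the new translation model is combined "linearly" with the original one. One common reading is per-entry linear interpolation of phrase translation probabilities, p(t|s) = w * p_orig(t|s) + (1 - w) * p_new(t|s). The sketch below assumes that reading; the function name, the weight, and the toy entries are hypothetical, and the article may instead tune the combination differently.

```python
from typing import Dict, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def interpolate_models(
    original: Dict[PhrasePair, float],
    feedback: Dict[PhrasePair, float],
    weight: float,
) -> Dict[PhrasePair, float]:
    """Linear interpolation: weight * p_orig(t|s) + (1 - weight) * p_new(t|s).

    Pairs missing from one model are treated as having probability 0 there,
    so entries learned only from user feedback still enter the mixture.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    pairs = original.keys() | feedback.keys()  # union of phrase pairs
    return {
        pair: weight * original.get(pair, 0.0) + (1 - weight) * feedback.get(pair, 0.0)
        for pair in pairs
    }
```

Because both inputs are normalized per source phrase, the mixture is too; in practice the weight would be tuned on held-out data rather than fixed by hand.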

    Insulin-like growth factor 2 (IGF2) protects against Huntington's disease through the extracellular disposal of protein aggregates

    Impaired neuronal proteostasis is a salient feature of many neurodegenerative diseases, highlighting alterations in the function of the endoplasmic reticulum (ER). We previously reported that targeting the transcription factor XBP1, a key mediator of the ER stress response, delays disease progression and reduces protein aggregation in various models of neurodegeneration. To identify disease modifier genes that may explain the neuroprotective effects of XBP1 deficiency, we performed gene expression profiling of the brain cortex and striatum of these animals and uncovered insulin-like growth factor 2 (Igf2) as the major upregulated gene. Here, we studied the impact of IGF2 signaling on protein aggregation in models of Huntington's disease (HD) as a proof of concept. Cell culture studies revealed that IGF2 treatment decreases the load of intracellular aggregates of mutant huntingtin and of a polyglutamine peptide. These results were validated using induced pluripotent stem cell (iPSC)-derived medium spiny neurons from HD patients and spinocerebellar ataxia cases. The reduction in the levels of mutant huntingtin was associated with a decrease in the half-life of the intracellular protein. The decrease in abnormal protein aggregation triggered by IGF2 was independent of the activity of autophagy and the proteasome pathways, the two main routes for mutant huntingtin clearance. Instead, IGF2 signaling enhanced the secretion of soluble mutant huntingtin species through exosomes and microvesicles, involving changes in actin dynamics. Administration of IGF2 into the brain of HD mice using gene therapy led to a significant decrease in the levels of mutant huntingtin in three different animal models. Moreover, analysis of human postmortem brain tissue and blood samples from HD patients showed reduced IGF2 levels.
This study identifies IGF2 as a relevant factor deregulated in HD, operating as a disease modifier that buffers the accumulation of abnormal protein species.

    ADGRL3 (LPHN3) variants predict substance use disorder

    Genetic factors are strongly implicated in the susceptibility to externalizing syndromes such as attention-deficit/hyperactivity disorder (ADHD), oppositional defiant disorder, conduct disorder, and substance use disorder (SUD). Variants in the ADGRL3 (LPHN3) gene predispose to ADHD and predict ADHD severity, comorbid disruptive behaviors, long-term outcome, and response to treatment. In this study, we investigated whether variants within ADGRL3 are associated with SUD, a disorder that is frequently comorbid with ADHD. Using family-based, case-control, and longitudinal samples from disparate regions of the world (n = 2698), recruited for clinical, genetic epidemiological, or pharmacogenomic studies of ADHD, we assembled recursive-partitioning frameworks (classification tree analyses) with clinical, demographic, and ADGRL3 genetic information to predict SUD susceptibility. Our results indicate that SUD can be efficiently and robustly predicted in ADHD participants. The genetic models remained highly efficient in predicting SUD in a large sample of individuals with severe SUD from a psychiatric institution who were not ascertained on the basis of an ADHD diagnosis, thus identifying ADGRL3 as a risk gene for SUD. Recursive-partitioning analyses revealed that rs4860437 was the predominant predictive variant. This new methodological approach offers novel insights into higher-order predictive interactions and provides a unique opportunity for translational application in the clinical assessment of patients at high risk for SUD.
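The abstract names recursive partitioning (classification trees) but not the software or split criterion. As an illustration only, the sketch below grows a small binary tree by repeatedly choosing the feature and threshold that minimize weighted Gini impurity, the CART-style criterion. The feature encoding (genotype as a 0/1/2 allele count, ADHD diagnosis as 0/1), the function names, and any data passed in are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]  # one participant's encoded features (hypothetical)


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 class labels (1 = SUD, assumed)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows: List[Row], labels: List[int]) -> Tuple[int, int, float]:
    """Return (feature index, threshold, weighted impurity) of the best split
    'row[feature] <= threshold'; feature index -1 means no split is possible."""
    best = (-1, 0, float("inf"))
    n = len(rows)
    for f in range(len(rows[0])):
        # Exclude the largest value so both children are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


def leaf(labels: List[int]) -> Dict:
    """Majority-class leaf (ties go to class 1)."""
    return {"predict": 1 if 2 * sum(labels) >= len(labels) else 0}


def grow(rows: List[Row], labels: List[int], depth: int) -> Dict:
    """Recursively partition until depth runs out or a node is pure."""
    if depth == 0 or gini(labels) == 0.0:
        return leaf(labels)
    f, t, _ = best_split(rows, labels)
    if f < 0:  # no feature varies within this node
        return leaf(labels)
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([rows[i] for i in left], [labels[i] for i in left], depth - 1),
        "right": grow([rows[i] for i in right], [labels[i] for i in right], depth - 1),
    }


def predict(tree: Dict, row: Row) -> int:
    """Follow the splits from the root down to a leaf."""
    while "predict" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["predict"]
```

Reading off which feature the root split uses is the simplest analogue of identifying a "predominant predictive variant"; real analyses would add pruning or cross-validation.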

    Single nucleotide polymorphisms in DNA repair genes as risk factors associated to prostate cancer progression

    Background: Besides serum PSA levels, there is a lack of prostate cancer-specific biomarkers. There is a need to develop new biological markers associated with tumor behavior, which would be valuable for better individualizing treatment. The aim of this study was to elucidate the relationship between single nucleotide polymorphisms (SNPs) in genes involved in DNA repair and prostate cancer progression.
Methods: A total of 494 prostate cancer patients from a Spanish multicenter study were genotyped for 10 SNPs in the XRCC1, ERCC2, ERCC1, LIG4, ATM, and TP53 genes. SNP genotyping was performed on a Biotrove OpenArray® NT Cycler. Clinical tumor stage, diagnostic serum PSA levels, and Gleason score at diagnosis were obtained for all participants. Genotypic and allelic frequencies were determined using the web-based environment SNPator.
Results: SNPs rs11615 (ERCC1) and rs17503908 (ATM) appeared as risk factors for prostate cancer aggressiveness. Patients homozygous for the wild-type allele of these SNPs (AA and TT, respectively) were at higher risk of developing cT2b–cT4 tumors (OR = 2.21, 95% confidence interval (CI) 1.47–3.31, p < 0.001) and Gleason scores ≥ 7 (OR = 2.22, 95% CI 1.38–3.57, p < 0.001), respectively. Moreover, patients homozygous for the wild-type allele of both SNPs had the greatest risk of presenting D’Amico high-risk tumors (OR = 2.57, 95% CI 1.28–5.16).
Conclusions: Genetic variants in DNA repair genes are associated with prostate cancer progression and should be taken into account when assessing the malignancy of prostate cancer.
This work was subsidized by a grant from the Instituto de Salud Carlos III (Ministerio de Economía y Competitividad, Spain), ID: PI12/01867. Almudena Valenciano holds a grant from the Instituto Canario de Investigación del Cáncer (ICIC).
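Odds ratios with 95% confidence intervals like those reported above are commonly obtained from a 2x2 genotype-by-outcome table. The sketch below shows the textbook unadjusted calculation with Woolf's log-scale interval; the abstract does not say which method the authors used (they may, for instance, have used logistic regression), and the function name and any counts passed to it are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

Because the interval is symmetric on the log scale, the product of its bounds equals the square of the odds ratio; an interval that excludes 1, as in each result above, indicates an association at roughly the 5% level.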